Device memory spill support (LRU-based only) #51
Conversation
@mrocklin let's move the conversation here.
We need to handle it somehow. We have to pass that information to
Under the assumption that we'll have one worker per GPU, I think that we probably don't need to divide at all. We just take the maximum available device memory of the current GPU and use that.
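For illustration, a minimal sketch of what "take the maximum available device memory of the current GPU" could look like, assuming one worker per GPU and the NVML Python bindings (pynvml); the helper name and device index handling are placeholders, not part of this PR.

```python
# Hedged sketch: with one worker per GPU, the per-worker device memory limit
# can simply be the total memory of the GPU the worker is bound to.
import pynvml

def total_device_memory(device_index=0):
    # In a real worker the index would come from CUDA_VISIBLE_DEVICES.
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    return pynvml.nvmlDeviceGetMemoryInfo(handle).total  # in bytes
```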
I know you @mrocklin said not to add
Thanks for doing this @pentschev. I've left a few minor comments, but I suspect that we can probably agree on things quickly and merge this soon.
dask_cuda/tests/test_spill.py (Outdated)
xx = x.persist()
yield wait(xx)

print(worker.data.device_buffer)
We might want to remove the print statements for now.
Yes, that was accidental.
memory_limit=parse_memory_limit(
    memory_limit, nthreads, total_cores=nprocs
),
),
I recommend the following instead:
data=(DeviceHostFile, {...})
Otherwise we create the data object immediately here, and then need to pass it down to the worker through a process boundary.
This is implemented here, but it looks like it's not documented anywhere (my mistake)
Also, if the user specifies device_memory_limit=0, then we might want something simpler. I can imagine wanting to turn off this behavior if things get complex.
We probably also want the same treatment in LocalCUDACluster, though as a first pass we can also include this ourselves.
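For reference, a rough sketch of the suggestion above, assuming the Worker/Nanny data= argument accepts a (class, kwargs) tuple as mentioned; the DeviceHostFile keyword names and the scheduler_address variable are illustrative, not the final API.

```python
# Sketch only: let the worker process construct DeviceHostFile itself by
# passing the class plus its kwargs, instead of an already-built object that
# would have to be sent across the process boundary.
from distributed import Nanny
from dask_cuda.device_host_file import DeviceHostFile  # module path assumed

nanny = Nanny(
    scheduler_address,  # placeholder for the scheduler's address
    data=(
        DeviceHostFile,
        {
            "device_memory_limit": device_memory_limit,  # e.g. total GPU memory
            "memory_limit": memory_limit,                # host memory limit
            "local_directory": local_directory,          # spill-to-disk location
        },
    ),
)
```

If device_memory_limit is 0, the data= argument could simply be left out so the worker falls back to its default storage, which would cover the "turn off this behavior" case above.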
I was in the process of doing that and only saw your comment now. Indeed, this is a better solution, but either way the downside is that I had to create the work directory in dask-cuda-worker, which I'm not particularly happy with, so if you have a suggestion on how to avoid that, it would be great!
I'm already working on LocalCUDACluster; I'm still not done, but will push those changes soon.
the downside is that I had to create the work directory in dask-cuda-worker, which I'm not particularly happy with, so if you have a suggestion on how to avoid that, it would be great!
Hrm, yes I can see how that would be a concern. I don't have a good solution currently unfortunately. I'll think about it though.
@mrocklin I think this is good now, with tests in place too. IMO, the only thing left to change is how we create the worker directory, if we can think of a better way, but besides that it's probably ready for review.
Looks good to me. Thank you for putting in the effort here, @pentschev. I'm looking forward to using this :)
Thanks for the review @mrocklin!
Based on #35, but provides only the simple LRU mechanism (i.e., no device memory monitoring).
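For illustration only, a minimal sketch of the LRU-only spilling idea using zict, which distributed already builds on: device objects live in a fast mapping and the least recently used ones are moved into a host-side mapping once a byte limit is exceeded. This is not the DeviceHostFile implementation from this PR; the stores, limit, and weighting function are assumptions.

```python
# Minimal sketch of LRU-based device->host spilling with zict (not the actual
# DeviceHostFile). Values are assumed to expose an nbytes attribute.
import pickle

from zict import Buffer, Func

device_memory_limit = 4 * 2**30                    # hypothetical 4 GiB limit

device_store = {}                                  # "fast": objects kept on the GPU
host_store = Func(pickle.dumps, pickle.loads, {})  # "slow": serialized into host memory

def weight(key, value):
    # Assumption: stored objects report their size via nbytes.
    return getattr(value, "nbytes", 1)

# Buffer evicts least-recently-used items from device_store into host_store
# once their total weight exceeds device_memory_limit.
data = Buffer(device_store, host_store, n=device_memory_limit, weight=weight)
```

A worker would then use such a mapping as its data store, optionally layering a second Buffer underneath to spill from host memory to disk once memory_limit is reached.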